2 research outputs found
A configurable vector processor for accelerating speech coding algorithms
The growing demand for voice-over-packet (VoIP) services and multimedia-rich
applications has made the efficient, real-time implementation of
low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders are
designed to substantially reduce bandwidth requirements, thus enabling dense multichannel
gateways in a small form factor. This, however, comes at a high computational cost,
which mandates the use of very high-performance embedded processors.
This thesis investigates the potential acceleration of two major ITU-T speech coding
algorithms, namely G.729A and G.723.1, through their efficient implementation on a
configurable, extensible vector embedded CPU architecture. New scalar and vector ISAs
were introduced, which resulted in up to 80% reduction in the dynamic instruction count
of both workloads. These instructions were subsequently encapsulated into a parametric,
hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research
and implementation of the vector datapath of this vector coprocessor, which is tightly coupled
to a SPARC V8-compliant CPU, the optimization and simulation methodologies
employed, and the use of Electronic System Level (ESL) techniques to rapidly design
SIMD datapaths.
Architecture, performance modeling and VLSI implementation methodologies for ASIC vector processors: a case study in telephony workloads
This research discusses hardware architectures, script-based automation, and software and hardware methodologies for developing customized System-on-Chip scalar/vector processors within the example application domain of telephony codecs. The approaches researched include Register-Transfer-Level methodologies, resulting in an SIMD-enhanced processor known as the ITU-VE1, and Electronic System Level methodologies, resulting in a multi-parallel vector processor known as the SS-SPARC. The example applications were the ITU-T G.729A and G.723.1 speech codecs, chosen for their abundant data-level parallelism and availability for research purposes. Results indicate that the proposed scalar/vector accelerators achieve maximum speed-ups of 4.27 and 4.62 for the G.729A and G.723.1 encoders respectively for 512-bit wide SIMD configurations. Both vector processors resulting from the proposed methodologies were implemented as VLSI macros and compared at the silicon level. Compared to the Register-Transfer-Level flow, the Electronic System Level flow implementing the same datapath increases power consumption by 3-15% but delivers an area reduction of 2-18% and substantially shortens design and verification time, making it a viable alternative to established RTL methodologies.